Abstract

For a given set ℒ of species and a set T of triplets on ℒ, we want to construct a phylogenetic
network which is consistent with T, i.e. which represents all triplets of T. The level of a network
is defined as the maximum number of hybrid vertices in its biconnected components. When T
is dense, there exist polynomial time algorithms to construct level-0, 1, 2 networks [1, 9, 10, 17].
For higher levels, partial answers were obtained in [18] with a polynomial time algorithm for
simple networks. In this paper, we detail the first complete answer for the general case, solving
a problem proposed in [10] and [17]: for any fixed k, it is possible to construct a minimum
level-k network consistent with T, if there is any, in time O(|T|^{k+1} n^{⌊4k/3⌋+1}).

Keywords: phylogenetic networks, level, triplets, reticulations. 

1 Introduction

The goal of phylogenetics is to reconstruct plausible evolutionary histories of currently living
organisms from biological data. To describe evolution, the standard model is a phylogenetic tree in
which each leaf is labeled by a species or a sequence, and in which each node having descendants
represents a common ancestor of its descendants. However, this model is not pertinent for capturing
hybridization, recombination and lateral gene transfer events. So a new model of network was
introduced, which allows a species to have more than one parent, see [2]. In recent years, a lot of
work has been done on developing methods for computing phylogenetic networks [4, 7, 11, 12].
In [3] a parameter was introduced for phylogenetic networks, which is the number of hybridization
nodes per biconnected component and called the level. The level of a network measures its distance
to a tree.

It is always difficult to reconstruct the evolution on a whole data set, so normally it is done only
on smaller data sets. Therefore, it is necessary to recombine the partial results into one model.
A triplet is the smallest tree that contains information on evolution, so a classic problem is to
recombine a set of triplets. If there is no constraint on the triplet set, the problem of constructing
a level-k phylogenetic network consistent with a triplet set is NP-hard for all levels higher than
0 [17, 19]. However, if the triplet set is dense, that is, if we require that there is at least one
triplet in the data for each three species, then the species set is better structured and it is possible
to construct a level-1 [9, 10] or a level-2 [17] network, if one exists, in polynomial time. The
following question was first asked in [10]: does the problem remain polynomial for level-k networks
for a fixed k? We present here an affirmative answer to this question. Our preliminary version in
[15] proved that we can construct a minimum level-k network in time O(|T|^{k+1} n^{3k+1}). In this
version, we present an improved result with a complexity of O(|T|^{k+1} n^{⌊4k/3⌋+1}). As a
consequence, a network with the minimum level can be found in polynomial time whenever the minimum
level is bounded: the running time is polynomial, with an exponent that depends on the minimum level.



Related works: 

[1] presented an O(|T|·n)-time algorithm for determining whether a given set T of triplets on n
leaves is consistent with some rooted, distinctly leaf-labeled tree, i.e. a level-0 network, and if so,
returning such a tree. Later, improvements were given in [6, 8]. But the problem has been proved
to be NP-hard for all other levels [9, 17, 19]. Similarly, the problem of finding a network consistent
with the maximum number of triplets is also NP-hard for all levels [9, 19]. The approximation
problem, which asks for a guaranteed fraction of the triplets with which a network can be made
consistent, has also been studied in [3] for level-0, level-1, and level-2 networks.

Concerning the particular case of dense triplet sets, there are the following results. For level-1,
[9] gives an O(|T|)-time algorithm to construct a consistent network, and [18] gives an O(n^5)-time
algorithm to construct a consistent one with the minimum number of reticulations. For level-2, [17]
gives an O(|T|^3)-time algorithm to construct a consistent network, and [18] presents an O(n^9)-time
algorithm to construct the consistent one with the minimum number of reticulations. For level-k
networks with any fixed k, there is only a result constructing all simple networks in O(|T|^{k+1}) time
[18]. Recently, in [16] it was proved that when the level is unrestricted, the problem is NP-hard.
Besides, an interesting recursive construction of level-k phylogenetic networks was proposed in [5].
Moreover, the problem of finding a network consistent with the maximum number of triplets is
NP-hard for all levels [19].

There are also studies on the version of extremely dense triplet sets, that is, when T is considered
to contain all triplets of a certain network. In this case, an O(|T|^{k+1})-time algorithm was given in
[18] for level-k networks.



2 Preliminaries 

Let us recall here some useful definitions also used in [1, 9, 10, 17]. Let ℒ be a set of n species or
taxa or sequences.

Definition 1 A phylogenetic network N on ℒ is a connected, directed, acyclic graph which has:

- a unique vertex of indegree 0 and outdegree 2 (root).

- vertices of indegree 1 and outdegree 2 (speciation vertices).

- vertices of indegree 2 and outdegree 1 (hybrid vertices or reticulation vertices).

- n vertices labeled distinctly by ℒ of indegree 1 and outdegree 0 (leaves). So ℒ is also called the
leaf set.
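
The degree conditions of Definition 1 can be checked mechanically. The following Python sketch is only an illustration: it assumes that a network is stored as a list of arcs (parent, child), and the name `classify_vertices` and this encoding are hypothetical choices that do not come from the paper. Connectivity and acyclicity are not checked.

```python
from collections import defaultdict

def classify_vertices(arcs):
    """Label each vertex by its role in Definition 1, or raise ValueError."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in arcs:
        outdeg[u] += 1
        indeg[v] += 1
    roles = {}
    for v in set(indeg) | set(outdeg):
        degs = (indeg[v], outdeg[v])
        if degs == (0, 2):
            roles[v] = "root"
        elif degs == (1, 2):
            roles[v] = "speciation"
        elif degs == (2, 1):
            roles[v] = "hybrid"
        elif degs == (1, 0):
            roles[v] = "leaf"
        else:
            raise ValueError(f"vertex {v!r} has illegal degrees {degs}")
    if sum(r == "root" for r in roles.values()) != 1:
        raise ValueError("a network must have exactly one root")
    return roles

# A small network on the leaves a, b, c with a single hybrid vertex h.
arcs = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
        ("v", "h"), ("v", "c"), ("h", "b")]
roles = classify_vertices(arcs)
```

In this toy network, `r` is the root, `u` and `v` are speciation vertices, `h` is the only hybrid vertex, and `a`, `b`, `c` are the leaves.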

For the sake of simplicity, we do not show the direction of arcs in the figures. By convention, arcs
are always directed away from the root. See Figure 1(a) for an example of a phylogenetic network
on ℒ = {a, b, ..., l}.




For any two vertices u, v of N, we denote u ⇝ v if u, v are distinct and there is a path in N
from u to v. In this case, we say that u is above v or, equivalently, v is below u. If u is either below
or above v then u, v are comparable. Given two paths p_1 : u_1 ⇝ v_1 and p_2 : u_2 ⇝ v_2 such that u_1
is not on p_2, u_2 is not on p_1 and p_1, p_2 have common vertices, if h is their highest common vertex,
then h must be a hybrid vertex. We say that p_1, p_2 intersect at h.

Denote by u ↠ v a path from u to v which does not contain any hybrid vertex below u and
above v, if such a path exists.
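
The existence of such a hybrid-free path can be tested by a graph search that refuses to walk through hybrid vertices. The sketch below is a minimal illustration under assumed conventions: the network is a dictionary `children` mapping each vertex to its children, `hybrids` is the set of hybrid vertices, and the function name `has_hybrid_free_path` is hypothetical.

```python
def has_hybrid_free_path(children, hybrids, u, v):
    """True if some path from u to v has no hybrid vertex strictly between them."""
    if u == v:
        return False  # the relation is only used for distinct vertices
    stack, seen = [u], {u}
    while stack:
        w = stack.pop()
        for c in children.get(w, []):
            if c == v:
                return True  # the endpoint v itself may be a hybrid vertex
            # Only continue through internal vertices that are not hybrid.
            if c not in seen and c not in hybrids:
                seen.add(c)
                stack.append(c)
    return False

# Toy network: r -> u, v; u -> a, h; v -> h, c; h -> b, where h is hybrid.
children = {"r": ["u", "v"], "u": ["a", "h"], "v": ["h", "c"], "h": ["b"]}
hybrids = {"h"}
reaches_h = has_hybrid_free_path(children, hybrids, "r", "h")
reaches_b = has_hybrid_free_path(children, hybrids, "r", "b")
```

Here the root reaches `h` by a hybrid-free path, but every path from the root to `b` passes through `h`, so no hybrid-free path to `b` exists.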

Let U(N) be the underlying undirected graph of N, obtained by replacing each directed edge
of N by an undirected edge.

Definition 2 [3] A phylogenetic network N is a level-k phylogenetic network iff each block of
U(N) contains at most k hybrid vertices.
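
Definition 2 can be evaluated directly on small examples. The sketch below is an illustration, not part of the construction algorithm: it assumes a network given as a list of arcs (parent, child) without parallel arcs, computes the blocks of U(N) with the classical depth-first search on an edge stack, and returns the largest number of hybrid vertices found in one block. The names `blocks` and `level` and the arc-list encoding are hypothetical.

```python
def blocks(adj):
    """Biconnected components (as vertex sets) of an undirected simple graph."""
    index, low, out, stack = {}, {}, [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for w in adj[u]:
            if w not in index:
                stack.append((u, w))            # tree edge
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= index[u]:          # u separates the subtree of w
                    comp = set()
                    while True:
                        a, b = stack.pop()
                        comp.update((a, b))
                        if (a, b) == (u, w):
                            break
                    out.append(comp)
            elif w != parent and index[w] < index[u]:
                stack.append((u, w))            # back edge to an ancestor
                low[u] = min(low[u], index[w])

    for s in adj:
        if s not in index:
            dfs(s, None)
    return out

def level(arcs):
    """Largest number of hybrid vertices (indegree 2) in a block of U(N)."""
    adj, indeg = {}, {}
    for u, v in arcs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        indeg[v] = indeg.get(v, 0) + 1
    hybrids = {v for v, d in indeg.items() if d == 2}
    return max((len(comp & hybrids) for comp in blocks(adj)), default=0)

tree_arcs = [("r", "u"), ("r", "c"), ("u", "a"), ("u", "b")]
arcs_level1 = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
               ("v", "h"), ("v", "c"), ("h", "b")]
arcs_level2 = [("r", "x"), ("r", "y"), ("x", "h1"), ("x", "z"), ("y", "h1"),
               ("y", "h2"), ("z", "h2"), ("z", "a"), ("h1", "b"), ("h2", "c")]
```

A hybrid vertex also belongs to the single-edge block formed by its outgoing cut-arc, but it always lies in the block containing its two incoming arcs as well, so counting over all blocks does not change the maximum.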



The network in Figure 1(a) is of level 2. It is easy to see that N is a level-0 phylogenetic network
iff N is a phylogenetic tree.

The block of U(N) that contains the vertex corresponding to the root of N is called the highest
block. By abuse of language, we call the subgraph of N which induces this block the highest block
of N. In Figure 1(a), the highest block is in bold. Denote by ℋ the set of the hybrid vertices
contained in the highest block of N, so |ℋ| ≤ k.

Each arc of N whose removal disconnects N is called a cut-arc. A cut-arc (u, v) is highest
if there is no cut-arc (u', v') such that v' ⇝ u. It can be seen that a highest cut-arc always has its
tail in the highest block.

A phylogenetic network is simple if each of its highest cut-arcs connects a vertex of the highest
block to a leaf. Figure 1(b) represents a simple level-2 network.



Definition 3 A triplet x|yz is a rooted binary tree whose leaves are x, y, z such that x and the
parent of y and z are children of the root. A set T of triplets is dense if for any set {x, y, z} ⊆ ℒ,
at least one triplet on these three leaves belongs to T.
A triplet x|yz is consistent with a network N if it is 'in' this network, i.e. N contains two
vertices p ≠ q and pairwise internal vertex-disjoint paths p ⇝ x, p ⇝ q, q ⇝ y, and q ⇝ z.
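
Density is a purely combinatorial condition and is easy to test. In the following sketch, a triplet x|yz is encoded as the tuple (x, y, z), with y and z unordered; this encoding and the function name `is_dense` are assumptions made for illustration.

```python
from itertools import combinations

def is_dense(leaves, triplets):
    """True if every 3-subset of `leaves` carries at least one triplet.

    A triplet x|yz is encoded as the tuple (x, y, z); y and z are unordered.
    """
    covered = {frozenset(t) for t in triplets}
    return all(frozenset(c) in covered for c in combinations(leaves, 3))

# The four triplets of the tree (((a, b), c), d): c|ab, d|ab, d|ac, d|bc.
T = {("c", "a", "b"), ("d", "a", "b"), ("d", "a", "c"), ("d", "b", "c")}
dense = is_dense("abcd", T)
not_dense = is_dense("abcd", T - {("d", "b", "c")})
```

The set T covers all four 3-subsets of {a, b, c, d}, so it is dense; removing d|bc leaves {b, c, d} uncovered.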

If a triplet is consistent with a network, we also say that this network is consistent with the
triplet. For example, a|bc is consistent with the network in Figure 1(a), but b|ac is not.

A network is consistent with a set of triplets iff it is consistent with all triplets in this set. 
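
For a tree (a level-0 network) the four disjoint paths of the definition exist exactly when the lowest common ancestor of y and z lies strictly below the lowest common ancestor of x and y. The sketch below covers only this special case; checking consistency with a general network requires a search for vertex-disjoint paths and is not attempted here. The tree is assumed to be given by a `parent` dictionary, and all function names are hypothetical.

```python
def ancestors(parent, v):
    """Vertices on the path from v up to the root, v included."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path

def lca(parent, x, y):
    """Lowest common ancestor of x and y in a rooted tree."""
    above_x = set(ancestors(parent, x))
    return next(v for v in ancestors(parent, y) if v in above_x)

def tree_displays(parent, triplet):
    """True if the tree given by `parent` is consistent with x|yz."""
    x, y, z = triplet
    q = lca(parent, y, z)
    p = lca(parent, x, q)
    return p != q  # q must lie strictly below p

def tree_consistent(parent, triplets):
    """A tree is consistent with a set iff it displays every triplet in it."""
    return all(tree_displays(parent, t) for t in triplets)

# The tree ((a, b), c): the root r has children u and c; u has children a, b.
parent = {"u": "r", "c": "r", "a": "u", "b": "u"}
ok = tree_displays(parent, ("c", "a", "b"))
bad = tree_displays(parent, ("a", "b", "c"))
```

In this tree, c|ab is displayed while a|bc is not.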

For the sake of simplicity, in the following it is always assumed that T is a dense triplet set, 
and we will consider the following problem : 



Main Problem 

Input: A dense triplet set T and a fixed integer k.
Output: A level-k phylogenetic network consistent with T.



We call a level-k network consistent with T having the minimum number of hybrid vertices a
minimum level-k network consistent with T.

Let L be a subset of ℒ. The restriction of T to L is denoted by T|L = {x|yz ∈ T such that
x, y, z ∈ L}. Let P be a partition of ℒ: P = {P_1, ..., P_m}. Denote T∇P = {P_i|P_jP_k such that
∃ x ∈ P_i, y ∈ P_j, z ∈ P_k with x|yz ∈ T and i, j and k are distinct}.
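
Both operations translate into a few lines of code. In this sketch a triplet x|yz is the tuple (x, y, z), a partition is a list of sets, and the parts are referred to by their index in that list; the names `restrict` and `quotient` are hypothetical.

```python
def restrict(triplets, subset):
    """T|L: the triplets whose three leaves all lie in `subset`."""
    subset = set(subset)
    return {t for t in triplets if set(t) <= subset}

def quotient(triplets, partition):
    """T∇P: triplets induced on the parts of `partition` (parts as indices)."""
    part_of = {leaf: i for i, part in enumerate(partition) for leaf in part}
    induced = set()
    for x, y, z in triplets:
        i, j, k = part_of[x], part_of[y], part_of[z]
        if len({i, j, k}) == 3:              # i, j, k pairwise distinct
            induced.add((i, min(j, k), max(j, k)))
    return induced

# Triplets of the tree (((a, b), c), d) and the partition {a, b}, {c}, {d}.
T = {("c", "a", "b"), ("d", "a", "b"), ("d", "a", "c"), ("d", "b", "c")}
P = [{"a", "b"}, {"c"}, {"d"}]
restricted = restrict(T, {"a", "b", "c"})
induced = quotient(T, P)
```

Only c|ab survives the restriction to {a, b, c}. The triplets d|ac and d|bc both induce the single triplet P_2|P_0P_1 on the parts, while c|ab and d|ab are discarded because a and b lie in the same part.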

For each network N, by removing the highest block and the highest cut-arcs, we obtain several
vertex-disjoint subnetworks N_1, ..., N_m. Each one is hung below a highest cut-arc. If in N we
replace each N_i by a leaf, then we have a simple network called N_S (Figure 1(b)). Let l(N_i) be
the leaf set of N_i, so l(N_i) is called a leaf set below a highest cut-arc. It is easy to see that
P(N) = {l(N_1), ..., l(N_m)} is a partition of ℒ. We can use biconnectivity to decompose our
problem as described in [17].

Lemma 1 (Decomposition lemma) N is a level-k network consistent with T iff each N_i is a
level-k network consistent with T|l(N_i) for any i = 1, ..., m and N_S is a simple level-k network
consistent with T∇P(N).

Constructing a simple level-k network consistent with T∇P(N), if such a one exists, can be
done in polynomial time using [18]. Therefore, the main difficulty in deriving a divide and conquer
approach from this lemma is to estimate the number of partitions that have to be checked.
Fortunately, the search can be restricted to a polynomial number of partitions, and to this
aim further definitions and technical lemmas are developed in the next sections.



2.1 SN-sets 

Remark that if A is a leaf set below a cut-arc, i.e. a part of P(N), then for any z ∈ ℒ\A, x, y ∈ A,
the only triplet on {x, y, z} that can be consistent with the network is z|xy. Based on this remark,
we define a family of leaf sets, called CA-sets, for CutArc-sets, as follows.

Definition 4 Let A ⊆ ℒ, then A is a CA-set if either it is a singleton or the whole ℒ, or if it
satisfies the following property: For any z ∈ ℒ\A, x, y ∈ A, the only triplet on {x, y, z} in T, if
there is any, is z|xy.
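
Definition 4 can be tested with one pass over the triplets: a violation occurs exactly when a triplet has two leaves inside A and one outside, and the outside leaf is not the one standing alone. The sketch below assumes the tuple encoding (x, y, z) for x|yz; the name `is_ca_set` is hypothetical.

```python
def is_ca_set(A, leaves, triplets):
    """Check Definition 4 for A, with x|yz encoded as (x, y, z)."""
    A, leaves = set(A), set(leaves)
    if len(A) == 1 or A == leaves:
        return True
    for x, y, z in triplets:
        inside = {x, y, z} & A
        # Exactly two leaves inside A: the outside leaf must be the lone leaf x.
        if len(inside) == 2 and x in A:
            return False
    return True

# Triplets of the tree (((a, b), c), d).
T = {("c", "a", "b"), ("d", "a", "b"), ("d", "a", "c"), ("d", "b", "c")}
ab = is_ca_set({"a", "b"}, "abcd", T)
ac = is_ca_set({"a", "c"}, "abcd", T)
```

Here {a, b} is a CA-set, whereas {a, c} is not, because of the triplet c|ab: the leaf b lies outside {a, c} but is not the lone leaf.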

As noticed above, each part of P(N) is a CA-set, but the converse claim is not always true.
Let us recall that [10] introduced a variation of these CA-sets by a closure operation, namely the
notion of SN-set. In [15], we showed the equivalence between these two definitions. Therefore, the
family of SN-sets is exactly the family of CA-sets and we will stick to the notation of SN-set for
any CA-set determined by Definition 4.

It was proved in [10] that if T is dense, then the collection of its SN-sets is a laminar family
[13]. It means that two SN-sets are either disjoint or one of them contains the other, hence this
family can be represented by a tree when considering inclusion. This tree is called the SN-tree; its
root corresponds to ℒ, and the leaves correspond to the singletons. Moreover, each node v of the
SN-tree represents an SN-set made up of the leaves of the subtree rooted at v. Let s_1, s_2 be two
SN-sets. We say that s_1 is a child of s_2, or s_2 is a parent of s_1, if in the SN-tree, the node
representing s_1 is a child of the node representing s_2. For example, in the SN-tree of Figure 2(a)
the SN-set {d, e} is a child of the SN-set {a, b, c, d, e}. A non-trivial maximal SN-set is a child
of ℒ. To simplify the notation, we call such a set a maximal SN-set.



Take for example the SN-tree in Figure 2(a). The SN-set {f, k, h, g, j, l} has two children {f, k}
and {h, g, j, l}. There are two maximal SN-sets, which are {a, b, c, d, e} and {i, f, k, h, g, j, l}.
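
The parent relation of the SN-tree can be recovered from the family of SN-sets alone: in a laminar family, the sets strictly containing a given set form a chain, so the parent is the smallest of them. The sketch below assumes the SN-sets are already known; the family used is the one described in the text for Figure 2(a), and the function names are hypothetical.

```python
def is_laminar(family):
    """True if any two sets of the family are disjoint or nested."""
    sets = [frozenset(s) for s in family]
    return all(not (s & t) or s <= t or t <= s
               for i, s in enumerate(sets) for t in sets[i + 1:])

def sn_tree_parents(family):
    """Map each set to its parent: the smallest set strictly containing it."""
    sets = [frozenset(s) for s in family]
    parent = {}
    for s in sets:
        supersets = [t for t in sets if s < t]
        if supersets:
            parent[s] = min(supersets, key=len)
    return parent

# SN-sets described for Figure 2(a): the singletons, eight other sets,
# and the whole leaf set.
leaves = "abcdefghijkl"
family = [set(x) for x in leaves] + [set(s) for s in (
    "abc", "de", "abcde", "jl", "ghjl", "fk", "fghjkl", "fghijkl", leaves)]
parents = sn_tree_parents(family)
whole = frozenset(leaves)
maximal = {s for s, p in parents.items() if p == whole}
```

With this family, `maximal` contains exactly the two maximal SN-sets {a, b, c, d, e} and {f, g, h, i, j, k, l}.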
2.2 Split SN-sets 

Definition 5 Let N be a network on ℒ consistent with T, and let S be an SN-set of T different
from ℒ. We say that S is split in N iff each child of S is equal to a part of P(N). In other words,
each child of S is the leaf set below a highest cut-arc of N, i.e. a certain l(N_i).



Example 1 Suppose that T has the SN-tree in Figure 2(a). T can be the set of all
possible triplets on the leaf set ℒ = {a, b, ..., k, l} except x|yz such that x, y but not z are in an
SN-set. It can be verified that the network N in Figure 2(b) is consistent with T. N has 9 subnetworks
N_i, which gives us the partition P(N) = {{a, b, c}, {d}, {e}, {f}, {g}, {h}, {i}, {k}, {j, l}}. The
SN-set {g, h, j, l} is split in this network because each of its children, {g}, {h}, {j, l}, corresponds to
a part of P(N). The SN-sets {d, e}, {f, k} are also split here. However, the SN-set {a, b, c, d, e} is
not split in this network. Indeed, its children are {a, b, c}, {d, e} and the latter one is not equal to
any part of P(N). All other SN-sets, {a, b, c}, {j, l}, {f, g, h, j, k, l}, {f, g, h, i, j, k, l}, the
singletons, and the whole ℒ, are not split either. In Figure 2(a) each white round node corresponds to
an SN-set which is split in N, and each square node corresponds to an SN-set which is a part of P(N).

By definition, a tree consistent with T has no split SN-set. 

[10, 17] showed that if T is consistent with a level-1 network, then there exists a level-1 network
N consistent with T such that each maximal SN-set is a part of P(N). So N has no split SN-set.

[17] showed that if T is consistent with a level-2 network, then there exists a level-2 network N
consistent with T in which each maximal SN-set is a part of P(N), except at most one maximal
SN-set S such that each child of S equals a part of P(N). So, N has at most 1 split SN-set.

For level-k networks, with k ≥ 3, a part of P(N) does not always correspond to a maximal
SN-set: it can be any SN-set at any level in the SN-tree (its depth in the SN-tree is not bounded
by a function of k), but the number of split SN-sets is bounded by a linear function of k. Indeed,
in [15], we proved that a level-k network consistent with T has at most 3k split SN-sets. In this
paper, we propose a stricter bound: if T is consistent with a level-k network, then there is a level-k
network N consistent with T such that the number of split SN-sets of N is bounded by ⌊4k/3⌋.

It is easy to see that two SN-sets which are both split in N are disjoint. An SN-set may be
split in a network but not split in another network which is also consistent with the same triplet
set. Therefore, when we say that S is split, we have to indicate in which network. However, for
convenience, from now on, when we say that S is a split SN-set, it means that S is split in N, a
level-k network consistent with T that we are going to construct.

Figure 2: Example 1 of split SN-sets. (a) The SN-tree of T. The white round nodes represent the
SN-sets that are split in N. The black square nodes represent the partition of ℒ in N. (b) A
network N consistent with T.

Lemma 2 The set of split SN-sets of N totally determines P(N).

Proof: Suppose that we know the set of all split SN-sets of a network N; we can determine P(N)
as follows. Let P_i be a part of P(N). Then either P_i is a child of a split SN-set, or it is not
included in any split SN-set. In the latter case, it is a biggest SN-set that is not comparable with
(neither included in nor containing) any split SN-set. For example, see Figure 2(a), where each split
SN-set corresponds to a white round node and each part of P(N) corresponds to a black square node. □
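
The proof of Lemma 2 is constructive, and the reconstruction can be written down as a short sketch. It assumes that the SN-sets and the split SN-sets are given as collections of leaf sets, and it excludes the whole leaf set from the candidates, since the whole leaf set is never a part; the function name `partition_from_split` is hypothetical. The data are those of Example 1.

```python
def partition_from_split(sn_sets, split):
    """Recover P(N) from the SN-sets that are split in N (Lemma 2)."""
    sn_sets = [frozenset(s) for s in sn_sets]
    split = [frozenset(s) for s in split]
    whole = max(sn_sets, key=len)            # the whole leaf set

    def children(s):
        inside = [t for t in sn_sets if t < s]
        return [t for t in inside if not any(t < u for u in inside)]

    # Every child of a split SN-set is a part.
    parts = {c for s in split for c in children(s)}
    # SN-sets that are neither inside nor around any split SN-set ...
    free = [t for t in sn_sets
            if t != whole and not any(t <= s or s <= t for s in split)]
    # ... contribute their inclusion-maximal members.
    parts |= {t for t in free if not any(t < u for u in free)}
    return parts

leaves = "abcdefghijkl"
family = [set(x) for x in leaves] + [set(s) for s in (
    "abc", "de", "abcde", "jl", "ghjl", "fk", "fghjkl", "fghijkl", leaves)]
parts = partition_from_split(family, ["de", "fk", "ghjl"])
```

With the three split SN-sets {d, e}, {f, k} and {g, h, j, l} of Example 1, the function returns the nine parts listed there. With no split SN-set at all, it returns the maximal SN-sets.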

So, to bound the number of possible partitions of consistent level-k networks, we will find a
bound for the number of split SN-sets in a consistent level-k network. The idea is to find relations
between the number of split SN-sets and the number of hybrid vertices. To this aim, some
functions from a split SN-set to a set of hybrid vertices will be introduced.
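
A rough count shows why such a bound gives polynomially many candidate partitions. A laminar family on n leaves has at most 2n − 1 sets, and by Lemma 2 a partition is determined by the choice of the split SN-sets. If at most m SN-sets are split, the number of candidates is therefore at most the sum of the binomial coefficients C(2n − 1, i) for i = 0, ..., m, which is O(n^m) for fixed m. The sketch below only evaluates this crude upper bound; the function name and the parameter m are illustrative assumptions.

```python
from math import comb

def max_candidate_partitions(n, m):
    """Upper bound on partitions when at most m of <= 2n - 1 SN-sets are split."""
    sn_sets = 2 * n - 1          # size bound for a laminar family on n leaves
    return sum(comb(sn_sets, i) for i in range(m + 1))

bound = max_candidate_partitions(12, 3)
```

For n = 12 leaves and at most m = 3 split SN-sets, the bound is 1 + 23 + 253 + 1771 = 2048.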



3 Some properties and functions of split SN-sets 

This section explores some properties and functions of split SN-sets which will be used in Section
4 to find a stricter bound for the number of split SN-sets in a level-k network consistent with T.

A vertex of N is a lowest common ancestor, lca for abbreviation, of a split SN-set S if it is the
lca of all leaves of S. If S has only one lca, then we denote it by lca(S). Remark that a lca of S
is never a hybrid vertex.

Let t be a lca of S, denote by N_t[S] the induced subgraph of N consisting of all paths from t
to the leaves of S, and N[S] = ∪ N_t[S] for all lcas t of S. In the figures which describe N in the
following, we represent N[S] by continuous lines and the parts not in N[S] by dotted lines.

For any SN-set S of T which is split in N, denote by s_1, ..., s_m the children of S, i.e. each s_i
is a part of P(N). Let u_i be the tail of the highest cut-arc below which s_i is attached, so u_i
is on the highest block of N. Sometimes, we write u_{s_i} instead of u_i when more than one split
SN-set is involved. For any subset f of ℒ\S, and children s_i, s_j of S, denote
f|s_is_j = {x|yz ∈ T such that x ∈ f and y, z ∈ s_i ∪ s_j}.




3.1 Function a 



Definition 6 For any split SN-set S of N, let:
a(S) = {h ∈ ℋ | h is in N[S] and ∃ a child s_i of S such that either
h = u_i or there is a path h ↠ u_i} (a for above)

Example 2 For example, in the figure on the right, a(S) =
{h_0, h_2, u_4} because they are in ℋ, in N[S], and we have the paths
h_0 ↠ u_3, h_2 ↠ u_2. h_1 is in N[S] but it is not in a(S) because any
path from h_1 to any u_i contains at least another hybrid vertex.

Lemma 3 Let S be a split SN-set of N.

(i) a(S) = ∅ iff N[S] does not contain any hybrid vertex of ℋ.
(ii) If |a(S)| ≤ 1 then S has only one lca.

(iii) If a(S) = ∅, then ∀ x ∉ S, there must exist a path from the root to x which is vertex-disjoint
with N[S].

Proof: (i) This claim is inferred directly from the definition of a.

(ii) Suppose that S has 2 lcas, called t_1, t_2. Let s_1 be a child of S, so there exists a path
t_1 ⇝ u_1 and a path t_2 ⇝ u_1. Since t_1 is neither above nor below t_2, these two paths must
intersect at a hybrid vertex above u_1. Let h_1 be a lowest hybrid vertex below t_1, t_2 and above
u_1, i.e. we have a path h_1 ↠ u_1. So h_1 ∈ a(S) by definition. Let s_2 be another child of S such
that t_1, t_2 are lcas of s_1, s_2. By the same argument, there is a hybrid vertex h_2 and a path
h_2 ↠ u_2, i.e. h_2 ∈ a(S). It is evident that h_1 ≠ h_2 because otherwise h_1 is a lca of s_1, s_2.
So a(S) contains at least two hybrid vertices h_1, h_2, a contradiction.

(iii) From the fact that a(S) = ∅, we deduce that N[S] does not have any hybrid vertex in ℋ
and S has only one lca. Let s_i, s_j be two children of S such that lca(s_i, s_j) = lca(S). Because S
is an SN-set, x|s_is_j is consistent with N for any x ∉ S. So, there exist two vertices p, q of N such
that there are 4 internal vertex-disjoint paths p ⇝ x, p ⇝ q, q ⇝ u_i and q ⇝ u_j. We deduce that
q = lca(S) because u_i, u_j have only one lca, which is lca(S). Suppose that the path p ⇝ x has
common vertices with N[S], then this path must pass lca(S) because N[S] does not have any hybrid
vertex in ℋ. It implies that the paths p ⇝ x, q ⇝ u_i, q ⇝ u_j have lca(S) as a common vertex, a
contradiction. So, there exists at least a path p ⇝ x which is vertex-disjoint with N[S]. Since p is
above lca(S), any path from the root to p avoids N[S], so extending p ⇝ x upward to the root gives
the required path. □



Lemma 4 For any h ∈ ℋ, |a^{-1}(h)| ≤ 1, i.e. a hybrid vertex is assigned to at most one split
SN-set by function a.

Proof: Assume that there exist 2 split SN-sets X, Y such that h ∈ a(X) ∩ a(Y). By definition,
there exists x which is a child of X such that either h = u_x or there is a path h ↠ u_x. Similarly
we have a child y of Y. Let t_Y be a lca of Y, so t_Y is above h, and let y' be another child of Y
such that lca(y, y') = t_Y. We see that any paths from a vertex above t_Y to x and to y must pass h
because there is no hybrid vertex on the paths from h to u_x and to u_y. So x|yy' is not
consistent with the network, contradicting Y being an SN-set. □




3.2 Function b 

Let S be a split SN-set of N such that a(S) = ∅, and h be a hybrid vertex not in N[S] below lca(S).
We denote lca(S) ⇢ h for a path from lca(S) to h such that there is a path from the root to h
which is vertex-disjoint with N[S], and for any hybrid vertex h' above h on this path, every path
from the root to h' has common vertices with N[S]. In other words, h is a highest hybrid vertex
below lca(S) which has a path coming to it from outside of N[S].

Note that if there is a path lca(S) ↠ h, i.e. if there is no hybrid vertex different from h on this
path, then this path is also a path lca(S) ⇢ h.

Definition 7 For any split SN-set S of N such that a(S) = ∅,
let b(S) = {h ∈ ℋ | ∃ a path lca(S) ⇢ h} (b for below)

Example 3 For example, in the figure on the right, b(S) = {h_1, h_2} because they are in ℋ and
we have two paths lca(S) ⇢ h_1, lca(S) ⇢ h_2. h_0 is not in b(S) because every path from the root
to h_0 has common vertices with N[S]. h_3 is not in b(S) because there is only one path from lca(S)
to h_3 but this path contains the hybrid vertex h_2 above h_3 and there is a path from the root to
h_2 (the dotted line) which is vertex-disjoint with N[S].

Lemma 5 ∀ h ∈ ℋ, there are at most 2 split SN-sets X, Y such that a(X) = a(Y) = ∅ and
h ∈ b(X) ∩ b(Y).

Proof: Suppose that there are 3 split SN-sets X, Y, Z such that a(X) = a(Y) = a(Z) = ∅ and a
hybrid vertex h ∈ b(X) ∩ b(Y) ∩ b(Z). By definition, h is below lca(X), h is not in N[X] and there
is a path c_X : lca(X) ⇢ h. Similarly for Y, Z.

The 3 paths c_X, c_Y, c_Z pass h, so at least two among them, for example c_Y, c_Z, have a
common vertex above h. We have the following cases:

(i) c_Y and c_Z intersect at a hybrid vertex h' above h. For a certain leaf y of Y, by Lemma 3 (iii),
there must exist a path c from the root to y which is vertex-disjoint with N[Z]. c must pass lca(Y)
because N[Y] does not contain any hybrid vertex in ℋ. So lca(Y) is not in N[Z] because c
is vertex-disjoint with N[Z]. h' is not in N[Z] either because h' ∈ b(Z). So the path lca(Y) ⇝ h'
does not have common vertices with N[Z]. Hence, the subpath of c from the root
to lca(Y) extended to h' is vertex-disjoint with N[Z]. The
latter is a contradiction because c_Z is a path lca(Z) ⇢ h.

(ii) lca(Y) is on c_Z. Similarly to the above case, there is a
path c from the root to lca(Y) which is vertex-disjoint with
N[Z]. So c intersects with c_Z at a hybrid vertex h' above
lca(Y). The subpath of c from the root to h' is also vertex-disjoint
with N[Z]. The latter is a contradiction because c_Z
is a path lca(Z) ⇢ h.

(iii) Similarly for the case where lca(Z) is on c_Y. □

Lemma 6 For any split SN-set S of N such that a(S) = ∅ and for any x ∉ S which is below
lca(S), let C_x be a path lca(S) ⇝ x, then C_x contains one and only one hybrid vertex in b(S).

Proof: By Lemma 3 (i), N[S] does not contain any hybrid vertex in ℋ. Since x ∉ S, by
Lemma 3 (iii) there must exist a path from the root to x which is vertex-disjoint with N[S].
This path must intersect with C_x at a hybrid vertex h' above x. It means that there is at least one
hybrid vertex on C_x which has a path coming to it from outside of N[S]. Let h_x be the highest
hybrid vertex on C_x having this property, then by definition h_x ∈ b(S). □

3.3 Restricting the searching class 

We introduce here two lemmas (Lemmas 7 and 9) which allow us to restrict the search to a class of
level-k phylogenetic networks having fewer split SN-sets without losing the ones having the minimum
number of hybrid vertices. It is a generalisation to level k of Theorem 3 in [17].

For any split SN-set S such that a(S) = ∅, let us define F_S to be the set of elements (x, y, z)
where x, y are below lca(S), x, y ∉ S, z ∈ S such that x|yz ∈ T.

Lemma 7 Given a level-k network N consistent with T. Let S be an SN-set of T which is split
in N such that a(S) = ∅ and for any (x, y, z) ∈ F_S:

- either there is a path from the root to x which is vertex-disjoint with a path from lca(S) to y,

- or there exist 2 vertices p, q in N such that p is above lca(S) and there are 4 internal
vertex-disjoint paths p ⇝ q, p ⇝ x, q ⇝ y, q ⇝ z.

Then, there is a level-k network N' consistent with T, having the same number of hybrid vertices
as N, in which S is not split but is equal to a part of P(N').

Proof: Suppose that S satisfies the condition stated in the lemma. Because a(S) = ∅, by Lemma
3 S has only one lca. Let G_S be the network obtained from N[S] by contracting all arcs having
one extremity of in-degree 1 and out-degree 1. So, G_S has one lca, called v_S. We construct N' from
N as follows (see the figure below in which only the part of the network that concerns S is drawn):
Delete from N all subnetworks on s_i, contract all arcs having one extremity of in-degree 1 and
out-degree 1, add a new vertex u_S in the middle of the arc coming to lca(S) and then add a new
arc from u_S to v_S where we attach G_S. So (u_S, v_S) is a highest cut-arc of N',
i.e. S is a part of P(N').



Because a(S) = ∅, there is not any hybrid vertex of ℋ in N[S]. So, G_S does not contain any
hybrid vertex except those in the subnetworks on s_i. It implies that N' has the same level and
the same number of hybrid vertices as N. Now, it remains to show that N' is consistent with all
triplets of T. For a triplet x|yz ∈ T, we can distinguish the 6 following cases:

(1) Since S is an SN-set, the cases x, y ∈ S and z ∉ S or x, z ∈ S and y ∉ S are excluded.

Figure 3: (2): x, y, z ∈ S

(2) x, y, z ∈ S (Figure 3), so there exist s_j, s_k, s_l such that x ∈ s_j, y ∈ s_k, z ∈ s_l (j, k, l are
not necessarily distinct). By definition of consistency, N has 2 vertices p, q and the internal
vertex-disjoint paths p ⇝ q, p ⇝ x, q ⇝ y, q ⇝ z. Because there is not any hybrid vertex in N[S], any
path from a vertex above lca(S) to any leaf of S must pass lca(S). So, p, q cannot be above lca(S)
because otherwise at least 2 among the 4 paths p ⇝ q, p ⇝ x, q ⇝ y, q ⇝ z have lca(S)
as a common vertex. We deduce that p, q are in N[S]. So, x|yz is consistent with N[S], or with
G_S, and then is consistent with N'.

(3) x, y, z ∉ S. We do not change the configuration of the network except the positions of the
subnetworks on s_i. So all triplets of this case remain consistent with N'.

(4) x ∉ S, y, z ∈ S. In N', y, z are below the highest cut-arc (u_S, v_S) while x is not. Hence, the
triplet x|yz is consistent with N' in this case.

(5) x ∈ S, y, z ∉ S. Let s_i be the child of S such that x ∈ s_i. Let p, q be two vertices of N
such that there are internal vertex-disjoint paths p ⇝ x, p ⇝ q, q ⇝ y, q ⇝ z; we will prove that
lca(S) and p are comparable. Suppose otherwise, then any path p ⇝ u_i intersects with any path
lca(S) ⇝ u_i at a hybrid vertex h'. If h' = u_i then u_i is a hybrid vertex in a(S), a contradiction
because a(S) = ∅. So h' is above u_i. However, h' is below lca(S), then a(S) contains at least one
vertex, a contradiction. Hence, we have two cases:

a) If lca(S) is below p (see Figure 4(a)), then q cannot be below lca(S) because otherwise
the paths p ⇝ x and p ⇝ q have lca(S) as a common vertex, a contradiction. In N the path p ⇝ x
must pass lca(S). So the path p ⇝ u_S in N' is a subpath of p ⇝ x in N. It implies that in N',
p ⇝ u_S is also internal vertex-disjoint with p ⇝ q, q ⇝ y and q ⇝ z. So, x|yz is consistent with
N' because x is below u_S.

b) If lca(S) is above or equal to p (see Figure 4(b)), then y, z are also below lca(S)
because they are below p. So, y, z have a lca q below lca(S), i.e. below u_S in N'. Moreover, in N',
x is hung below the highest cut-arc (u_S, v_S), then any path u_S ⇝ x is internal vertex-disjoint with
u_S ⇝ q, q ⇝ y, q ⇝ z. In other words, x|yz is consistent with N'.

Figure 4: (5): x ∈ S, y, z ∉ S. (a) lca(S) is below p. (b) lca(S) is above p.





Figure 5: (6): x, y ∉ S, z ∈ S. (a) lca(S) is below p. (b) lca(S) is above p.



(6) x, y ∉ S, z ∈ S. Let s_i be the child of S such that z ∈ s_i. For any two vertices p, q of N such
that there are 4 internal vertex-disjoint paths p ⇝ x, p ⇝ q, q ⇝ y, q ⇝ z, by the same argument
as that of the previous case (5), we deduce that lca(S) and p are comparable.

a) If lca(S) is below p (see Figure 5(a)), then in N', u_S is below p. Similarly to the case
(5a), the path p ⇝ u_S in N' is a subpath of p ⇝ z in N, so it is internal vertex-disjoint with
p ⇝ x, q ⇝ y in N'. Because in N', z is below the cut-arc (u_S, v_S), every path u_S ⇝ z is internal
vertex-disjoint with the paths p ⇝ x, p ⇝ q, q ⇝ y, no matter where the position of q is (above or
below lca(S)). Hence, x|yz is consistent with N'.

b) If lca(S) is above or equal to p (see Figure 5(b)), then x, y are also below lca(S) because
they are below p. It means that (x, y, z) ∈ F_S. According to the assumption:

- either there exist two other vertices p', q' such that p' is above lca(S) and there are internal
vertex-disjoint paths p' ⇝ x, p' ⇝ q', q' ⇝ y, q' ⇝ z, and then we are back in case (a),

- or there is a path C_x from the root to x which is vertex-disjoint with a path C_y from lca(S) to
y. In this case, C_x in N' is also vertex-disjoint with the path from u_S to y by using C_y. So, x|yz is
also consistent with N'.

We conclude that N' is consistent with all triplets of T. □

For any split SN-set S such that a(S) = ∅, let F'_S be the subset of F_S containing the elements
(x, y, z) such that:

(*) every path from the root to x has common vertices with every path from lca(S) to y, and
(**) let p, q be two vertices such that there are internal vertex-disjoint paths p ⇝ x, p ⇝ q,
q ⇝ y, q ⇝ z, then p, q are in N[S].

Therefore, S does not satisfy the condition in Lemma 7 iff F'_S is not empty.

Lemma 8 Let S be a split SN-set in N such that a(S) = ∅ and F'_S ≠ ∅. For any (x, y, z) ∈ F'_S,
there exist two distinct hybrid vertices h_x, h_y in b(S) such that h_x is above x, h_y is above y, and
h_x is below h_y.

Proof: By (**), there is a vertex p in N[S] and two internal vertex-disjoint paths c_x : p ⇝ x and
c_y : p ⇝ y. Let C_x be the path from lca(S) to x containing c_x. Let C_y be the path from lca(S)
to y containing c_y. By Lemma 6 there is a hybrid vertex of b(S) on C_x, namely h_x. Because
h_x ∉ N[S], p ∈ N[S], so h_x is below p, i.e. h_x is on c_x. Similarly, there is a hybrid vertex h_y of
b(S) on c_y. h_x ≠ h_y because c_x, c_y are vertex-disjoint.

We will prove that h_x is below h_y. Let c'_x be the subpath of c_x from h_x to x, and c'_y be the
subpath of c_y from h_y to y. Because lca(S) ⇢ h_x, there is a path, called C, from the root to h_x
which is vertex-disjoint with N[S]. Let C'_x be the path from the root to x consisting of C and c'_x.
By (*), C'_x and C_y must have common vertices. So, they must intersect at a hybrid vertex h' because
C'_x does not pass lca(S), while x is below lca(S). If h' is below h_y, then h_y is above h_x because
h' is above h_x. So we are done. Suppose that h' is above h_y. Because h_y ∈ b(S), by definition every
path from the root to h' must have common vertices with N[S]. So, C has common vertices with N[S],
a contradiction. Hence, h_x is below h_y. □



Lemma 9 Let $N$ be a level-$k$ network consistent with $\mathcal{T}$, and let $S$ be an SN-set of $\mathcal{T}$ which is split in $N$ such that $a(S) = \emptyset$ and $|b(S)| \le 2$. Then there is a level-$k$ network $N'$ consistent with $\mathcal{T}$, having the same number of hybrid vertices as $N$, in which $S$ is not split but equal to a part of $P(N')$.

Proof: If $S$ satisfies the conditions in Lemma 7 then we are done. Suppose that $S$ does not satisfy the conditions in Lemma 7, so $F'_S$ is not empty. By using Lemma 8 with a certain element $(x, y, z)$ of $F'_S$, we deduce that $b(S)$ contains at least 2 hybrid vertices. So $|b(S)| = 2$. Denote by $h_1, h_2$ the two hybrid vertices of $b(S)$. Also by Lemma 8, $h_1, h_2$ are comparable. Suppose that $h_1$ is below $h_2$; then for any $(x, y, z) \in F'_S$, $h_x = h_1$ and $h_y = h_2$, where $h_x, h_y$ are defined as in the proof of Lemma 8.






We construct $G_S$ and modify $N$ in the same way as in the proof of Lemma 7. However, the position of $u_S$, below which we hang $G_S$, is chosen differently. Let $p_0$ be a vertex of $N[S]$ such that there are two internally vertex-disjoint paths $p_0 \leadsto h_1$ and $p_0 \leadsto h_2$. Such a $p_0$ always exists: for example, we can choose $p_0 = p$, where $p$ is defined in Lemma 8 for a certain element $(x, y, z) \in F'_S$. The vertex $u_S$ is put in the middle of the arc going out of $p_0$ on the path $p_0 \leadsto h_2$. Denote the obtained network by $N'$.
It is easy to see that $N'$ has the same level and the same number of hybrid vertices as $N$. We must check that all triplets $x|yz$ of $\mathcal{T}$ are consistent with $N'$. It can be verified that the proof of Lemma 7 still holds here for all triplets except in the cases (5b) and (6b). Let $p, q$ be two vertices of $N$ such that there exist 4 internally vertex-disjoint paths $p \leadsto q$, $p \leadsto x$, $q \leadsto y$, $q \leadsto z$.

(5b) $x \in S$, $y, z \notin S$ and $p$ is in $N[S]$, i.e. $y, z$ are below $lca(S)$. By Lemma 6, any leaf below $lca(S)$ and not in $S$ must be below a hybrid vertex of $b(S)$. Moreover, $b(S)$ contains only $h_1, h_2$, and $h_1$ is below $h_2$. So both $y, z$ are below $h_2$. Hence, there exists an lca $q'$ of $y, z$ which is below $h_2$. Furthermore, $u_S$ is above $h_2$, so in $N'$ there are 4 internally vertex-disjoint paths $u_S \leadsto x$, $u_S \leadsto q'$, $q' \leadsto y$ and $q' \leadsto z$, i.e. $x|yz$ is consistent with $N'$ (Figure 6(a)).

Figure 6: $x|yz$ is consistent with $N'$. (a) Case (5b): $x \in S$, $y, z \notin S$. (b) Case (6b): $x, y \notin S$, $z \in S$.



(6b) $x, y \notin S$, $z \in S$ and $p$ is in $N[S]$, i.e. $x, y$ are below $lca(S)$. Then each triplet $x|yz$ in this case corresponds to an element $(x, y, z)$ of $F_S$.

- If $(x, y, z) \in F_S \setminus F'_S$, then it satisfies the properties in Lemma 7. We can use the same argument as in the proof of Lemma 7 to prove that $x|yz$ is consistent with $N'$.

- Otherwise, $(x, y, z) \in F'_S$, so $h_1 = h_x$ and $h_2 = h_y$. In other words, $x$ is below $h_1$ and $y$ is below $h_2$. By construction, $u_S$ is added on the path $p_0 \leadsto h_2$, so in $N'$ there are 4 internally vertex-disjoint paths $p_0 \leadsto h_1 \leadsto x$, $p_0 \leadsto u_S$, $u_S \leadsto h_2 \leadsto y$ and $u_S \leadsto z$ (Figure 6(b)). Hence, $x|yz$ is consistent with $N'$. □



Using Lemmas 7 and 9, without losing any network having the minimum number of hybrid vertices, we can restrict the search to the networks $N$ such that each split SN-set $S$ of $N$ having $a(S) = \emptyset$ does not satisfy the 2 conditions in these lemmas. In the following, we use only one of these two conditions, i.e. if $a(S) = \emptyset$, then $|b(S)| \ge 3$.

4 A bound on the restricted networks class 



As concluded in Section 3.3, without losing any level-$k$ network having the minimum number of hybrid vertices, we can suppose that the level-$k$ networks to be constructed have the following property: for any split SN-set $S$ of $N$, if $a(S) = \emptyset$, then $|b(S)| \ge 3$.

Let $\mathcal{S}$ be the set of SN-sets of $\mathcal{T}$ that are split in $N$. We will bound $|\mathcal{S}|$ by a stricter linear function of $k$. To this aim, we study the functions $a, b$ defined in Section 3 and another function $t$ defined below. We introduce some lemmas showing properties of each function, which allow us to establish the relation between the number of hybrid vertices in $\mathcal{H}$ and the cardinality of $\mathcal{S}$.



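4.1 Partition $\mathcal{H}$ and $\mathcal{S}$ by the functions $a, b$

Using the function $a$, we partition $\mathcal{H}$ and $\mathcal{S}$ into several subsets (Figure 7(a)):

- $\mathcal{H}^a_0 = \{h \in \mathcal{H} \mid a^{-1}(h) = \emptyset\}$, and $\mathcal{S}^a_0 = \{S \in \mathcal{S} \mid a(S) = \emptyset\}$.

- For any $i \ge 1$, $\mathcal{S}^a_i = \{S \in \mathcal{S} \mid |a(S)| = i\}$, so all $\mathcal{S}^a_i$ are pairwise disjoint.

- $\mathcal{H}^a_i$ is the image of $\mathcal{S}^a_i$ by the function $a$. By Lemma 4, all $\mathcal{H}^a_i$ are pairwise disjoint.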




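Figure 7: The 3 functions $a, b, t$ and their properties. (a) $|\mathcal{H}^a_i| = i|\mathcal{S}^a_i|$, $\forall i \ge 1$. (b) $3|\mathcal{S}^a_0| \le |\mathcal{H}^b_1| + 2|\mathcal{H}^b_2|$, since $|b(S)| \ge 3$. (c) $|\mathcal{H}^a_1 \cap \mathcal{H}^b_2| \le 2|\mathcal{H}^a_0 \cap \mathcal{H}^b_1| + |\mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_1|$. The set of hybrid vertices: $\mathcal{H} = \bigcup_i \mathcal{H}^a_i = \mathcal{H}^b_1 \cup \mathcal{H}^b_2$. The set of split SN-sets: $\mathcal{S} = \bigcup_i \mathcal{S}^a_i$.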



Lemma 10 $|\mathcal{S}| \le k + |\mathcal{S}^a_0| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}|$



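Proof: By definition, all $\mathcal{H}^a_i$ are pairwise disjoint, and $|\mathcal{H}^a_i| = i|\mathcal{S}^a_i|$ for any $i \ge 1$. Then:

$$|\mathcal{S}| = |\mathcal{S}^a_0| + |\mathcal{S}^a_1| + \sum_{i \ge 2} |\mathcal{S}^a_i| = |\mathcal{S}^a_0| + |\mathcal{H}^a_1| + \sum_{i \ge 2} \frac{1}{i}|\mathcal{H}^a_i|.$$

Furthermore, $|\mathcal{H}| = |\mathcal{H}^a_0| + |\mathcal{H}^a_1| + \sum_{i \ge 2} |\mathcal{H}^a_i| \le k$.

So, $|\mathcal{S}| \le k + |\mathcal{S}^a_0| - |\mathcal{H}^a_0| - \sum_{i \ge 2} \left(1 - \frac{1}{i}\right)|\mathcal{H}^a_i| \le k + |\mathcal{S}^a_0| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}|$. □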
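The counting argument of Lemma 10 can be checked mechanically on small instances. The following Python sketch is only an illustration: the function name `lemma10_sides`, the dictionary encoding of the function $a$, and the choice $k = |\mathcal{H}|$ (the lemma only needs $|\mathcal{H}| \le k$) are assumptions made for this example and do not come from the paper.

```python
from fractions import Fraction


def lemma10_sides(a, hybrids):
    """Return (|S|, k + |S^a_0| - |H^a_0| - 1/2 |H^a_{>=2}|) with k = |H|.

    a       : dict mapping each split SN-set (any hashable name) to the set
              a(S) of hybrid vertices; the images must be pairwise disjoint.
    hybrids : set of all hybrid vertices H.
    """
    images = [h for hs in a.values() for h in hs]
    # Lemma 4 of the paper: the sets a(S) are pairwise disjoint.
    if len(images) != len(set(images)):
        raise ValueError("the images a(S) must be pairwise disjoint")
    if not set(images) <= set(hybrids):
        raise ValueError("a(S) must only contain hybrid vertices")

    k = len(hybrids)                                          # assumption: k = |H|
    s0 = sum(1 for hs in a.values() if len(hs) == 0)          # |S^a_0|
    h0 = len(set(hybrids) - set(images))                      # |H^a_0|
    h_ge2 = sum(len(hs) for hs in a.values() if len(hs) >= 2) # |H^a_{>=2}|

    lhs = len(a)
    rhs = k + s0 - h0 - Fraction(h_ge2, 2)
    return lhs, rhs
```

For instance, with `a = {"S1": set(), "S2": {"h1"}, "S3": {"h2", "h3"}}` and four hybrid vertices `h1` to `h4`, both sides equal 3, so the inequality of Lemma 10 is tight on this toy instance.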
Then, in order to bound $|\mathcal{S}|$, it remains to determine the relations between $|\mathcal{S}^a_0|$, $|\mathcal{H}^a_0|$ and $|\mathcal{H}^a_{\ge 2}|$. Due to Lemma 5, we can use the function $b$ to partition $\mathcal{H}$ into 2 subsets (Figure 7(b)):

$\mathcal{H}^b_1 = \{h \in \mathcal{H} \mid$ there is at most one split SN-set $X$ of $\mathcal{S}^a_0$ such that $h \in b(X)\}$.

$\mathcal{H}^b_2 = \{h \in \mathcal{H} \mid$ there are two split SN-sets $X, Y$ of $\mathcal{S}^a_0$ such that $h \in b(X) \cap b(Y)\}$.

So, $\mathcal{H}^b_1$ and $\mathcal{H}^b_2$ are disjoint.

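Lemma 11 (i) $3|\mathcal{S}^a_0| \le |\mathcal{H}^b_1| + 2|\mathcal{H}^b_2|$.
(ii) $|\mathcal{S}| \le \frac{4}{3}k + \frac{1}{3}|\mathcal{H}^b_2| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}|$.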

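Proof: (i) According to the assumption on the restricted search class, $\forall S \in \mathcal{S}^a_0$, $|b(S)| \ge 3$. Count the pairs $(S, h)$ with $S \in \mathcal{S}^a_0$ and $h \in b(S)$: each $S$ contributes at least 3 pairs, while by the definition of $\mathcal{H}^b_1, \mathcal{H}^b_2$ above each $h \in \mathcal{H}^b_1$ appears in at most one pair and each $h \in \mathcal{H}^b_2$ in two pairs. So we are done (see Figure 7(b)).

(ii) By Lemma 10 and Claim (i), we have:

$$|\mathcal{S}| \le k + |\mathcal{S}^a_0| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}| \le k + \frac{1}{3}|\mathcal{H}^b_1| + \frac{2}{3}|\mathcal{H}^b_2| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}|.$$

We have $|\mathcal{H}^b_1| + |\mathcal{H}^b_2| = |\mathcal{H}| \le k$, so $|\mathcal{H}^b_1| \le k - |\mathcal{H}^b_2|$.

So, $|\mathcal{S}| \le \frac{4}{3}k + \frac{1}{3}|\mathcal{H}^b_2| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}|$. □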

To reach our main result we need to find the relation between $|\mathcal{H}^b_2|$ and $|\mathcal{H}^a_0|$, $|\mathcal{H}^a_{\ge 2}|$. To this aim, a function $t$ is introduced.



4.2 Function t 



For any $h \in \mathcal{H}^a_{\ge 1}$, denote by $S_h$ the only split SN-set such that $h \in a(S_h)$. For any $h$ in $\mathcal{H}^a_1$, denote by $P_h$ a path from $lca(S_h)$ to $h$. We can always choose for each $h$ in $\mathcal{H}^a_1$ a path such that: $\forall h_1, h_2 \in \mathcal{H}^a_1$, if $P_{h_1}, P_{h_2}$ have two common vertices $u, v$ such that $u$ is above $v$, then the two subpaths of $P_{h_1}$ and $P_{h_2}$ from $u$ to $v$ are the same. It is easy to see that such a path always exists for each hybrid vertex of $\mathcal{H}^a_1$. Indeed, if the two subpaths of $P_{h_1}$ and $P_{h_2}$ from $u$ to $v$ are not the same, then we need only change the subpath of $P_{h_2}$ from $u$ to $v$ to be the same as the subpath of $P_{h_1}$ from $u$ to $v$. The new path $P_{h_2}$ is still a path from $lca(S_{h_2})$ to $h_2$.

$\forall h \in \mathcal{H}^a_1$, we define some sets of hybrid vertices associated with $h$ as follows:

$I_0(h)$ is the set of hybrid vertices in $\mathcal{H}^a_0$ different from $h$ on $P_h$.

$I_1(h)$ is the set of hybrid vertices $h'$ in $\mathcal{H}^a_1$ different from $h$ on $P_h$ such that $P_h$ and $P_{h'}$ have common vertices above $h'$.

$I_2(h)$ is the set of hybrid vertices $h'$ in $\mathcal{H}^a_{\ge 2}$ different from $h$ on $P_h$ such that $P_h$ and $N[S_{h'}]$ have common vertices above $h'$.

Finally, $I(h) = I_0(h) \cup I_1(h) \cup I_2(h)$.
Example 4 For example, in the accompanying figure, $h$ is a hybrid vertex of $\mathcal{H}^a_1$ and the path $P_h$ contains 4 hybrid vertices $h_1, h_2, h_3, h_4$. Suppose that $h_2 \in \mathcal{H}^a_0$, $h_1, h_3 \in \mathcal{H}^a_1$ and $h_4 \in \mathcal{H}^a_{\ge 2}$. So $I(h) = \{h_2, h_3, h_4\}$, where $h_2 \in I_0(h)$, $h_3 \in I_1(h)$ and $h_4 \in I_2(h)$. $h_1 \notin I(h)$ because the path $P_{h_1}$ does not have common vertices above $h_1$ with $P_h$.

Next, the function $t$ is defined as follows.

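Definition 8 For every $h \in \mathcal{H}^a_1$, if $I(h) = \emptyset$ then $t(h) = null$. Otherwise, let $h_0$ be the highest hybrid vertex of $I(h)$, so:

If $h_0 \in \mathcal{H}^a_0 \cup \mathcal{H}^a_{\ge 2}$ then $t(h) = h_0$.

If $h_0 \in \mathcal{H}^a_1$, then $t(h) = t(h_0)$, and we denote $h \to h_0$.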
Example 5 For example, in Figure (i) suppose that $h_1, h_2 \in \mathcal{H}^a_1$ and $h_3 \in \mathcal{H}^a_0$, then $t(h) = h_3$.

In Figure (ii), suppose that $h_1, h_2, h_3 \in \mathcal{H}^a_1$ and $h_0 \in \mathcal{H}^a_0$, then $t(h) = t(h_3)$ and $h \to h_3$. Next, $t(h_3) = h_0$, so $t(h) = t(h_3) = h_0$.
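The recursion in Definition 8 can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not part of the paper: the sets $I(h)$ are supposed to be already computed and summarized by a dictionary `highest_in_I` giving the highest vertex of $I(h)$, and the classes $\mathcal{H}^a_0$, $\mathcal{H}^a_1$, $\mathcal{H}^a_{\ge 2}$ are encoded as 0, 1, 2 in a dictionary `a_class`. Both names are hypothetical.

```python
def t(h, highest_in_I, a_class):
    """Follow Definition 8 and return (t(h), chain).

    highest_in_I : dict, h -> highest hybrid vertex of I(h);
                   a missing key or None means I(h) is empty.
    a_class      : dict, h -> 0, 1 or 2 for H^a_0, H^a_1, H^a_{>=2}.
    chain        : the list h, h_1, ..., h_m with h -> h_1 -> ... -> h_m.
    """
    chain = [h]
    seen = {h}
    current = h
    while True:
        h0 = highest_in_I.get(current)
        if h0 is None:
            # I(current) is empty, so t is null along the whole chain.
            return None, chain
        if a_class[h0] != 1:
            # h0 is in H^a_0 or H^a_{>=2}: the recursion stops here.
            return h0, chain
        if h0 in seen:
            # Guard only; Lemma 15 (i) argues that this cannot happen
            # for vertices of the class studied there.
            raise ValueError("the relation h -> h0 loops")
        seen.add(h0)
        chain.append(h0)
        current = h0
```

With the data of Example 5 (ii), i.e. `highest_in_I = {"h": "h3", "h3": "h0"}` and `a_class = {"h": 1, "h3": 1, "h0": 0}`, the call `t("h", highest_in_I, a_class)` returns `("h0", ["h", "h3"])`, which matches $t(h) = t(h_3) = h_0$ and $h \to h_3$.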

The 3 following lemmas will be used to prove some properties of the function $t$.
Lemma 12 $\forall h \in \mathcal{H}^a_1$, $\forall S \in \mathcal{S}^a_{\le 1}$, if $lca(S)$ is on $P_h$ then there exists a hybrid vertex in $\mathcal{H}^a_0$ which is on $P_h$ and above $lca(S)$.

Proof: Let $s, s'$ be two children of $S$ such that $lca(s, s') = lca(S)$. Suppose that there is a hybrid vertex on $P_h$ below $lca(S_h)$ and above $lca(S)$, and let $h_0$ be the lowest one (Figure 8(a)). If $h_0 \in \mathcal{H}^a_0$ then we are done. Suppose that $h_0$ is not in $\mathcal{H}^a_0$; then there exist the split SN-set $S_{h_0} = a^{-1}(h_0)$ and a child $s_0$ of $S_{h_0}$ such that there is a path $h_0 \leadsto u_0$. Let $s'_0$ be another child of $S_{h_0}$ such that $lca(s_0, s'_0)$ is above $h_0$. Since $h_0$ is the lowest hybrid vertex on $P_h$ below $lca(S_h)$ and above $lca(S)$, there is no hybrid vertex on $P_h$ which is below $h_0$ and above $lca(S)$. Therefore, in order that $s|s_0 s'_0$ is consistent with $N$, there must exist a hybrid vertex below $lca(S)$ and above $u$. Similarly, for $s'|s_0 s'_0$ there must be a hybrid vertex below $lca(S)$ and above $u'$. It means that $a(S)$ contains at least two elements, a contradiction.

Figure 8: Proof of Lemma 12. (a) There is a hybrid vertex $h_0$ not in $\mathcal{H}^a_0$ on $P_h$ above $lca(S)$. (b) There is not any hybrid vertex on $P_h$ above $lca(S)$.

Hence, there is not any hybrid vertex on $P_h$ below $lca(S_h)$ and above $lca(S)$. Because $h \in a(S_h)$, $S_h$ has a child $s_h$ such that either $u_h = h$ or there is a path $h \leadsto u_h$. Let $s'_h$ be another child of $S_h$ such that $lca(s_h, s'_h) = lca(S_h)$ (Figure 8(b)). So, there is not any hybrid vertex on any path from $lca(S_h)$ to $u'_h$, because otherwise $a(S_h)$ contains another hybrid vertex different from $h$, contradicting $S_h \in \mathcal{S}^a_1$. It implies that every path from a vertex above $lca(S_h)$ to $u'_h$ must pass $lca(S_h)$. Therefore, in order that $s|s_h s'_h$ is consistent with the network, there must exist a hybrid vertex below $lca(S)$ and above $u$. Similarly, for $s'|s_h s'_h$ there must exist a hybrid vertex below $lca(S)$ and above $u'$. It means that $a(S)$ contains at least two elements, a contradiction. □

Lemma 13 For any $h \in \mathcal{H}^a_1$, let $h'$ be a hybrid vertex on $P_h$ which is in $\mathcal{H}^b_2$. Then there is a hybrid vertex of $\mathcal{H}^a_0$ above $h'$ on $P_h$.

Proof: Because $h' \in \mathcal{H}^b_2$, there exist $T, T' \in \mathcal{S}^a_0$ such that $h' \in b(T) \cap b(T')$. By definition, $h' \notin N[T]$ and there exists a path $c_T : lca(T) \leadsto h'$. Similarly, for $T'$ we have a path $c_{T'}$.

We will prove that there exists a path $c$ from the root to $lca(S_h)$ which is vertex-disjoint with $N[T]$ (Figure 9(a)). Since $h \in a(S_h)$, $S_h$ has a child $s$ such that either $u = h$ or there is a path $h \leadsto u$. Let $s'$ be another child of $S_h$ such that $lca(s, s') = lca(S_h)$. There is not any hybrid vertex above $u'$ and below $lca(S_h)$, because otherwise $a(S_h)$ contains another hybrid vertex different from $h$. By Lemma 3 (iii), there is a path $c'$ from the root to $u'$ which is vertex-disjoint with $N[T]$. $c'$ must pass $lca(S_h)$, because otherwise there must be a hybrid vertex above $u'$ and below $lca(S_h)$. Let $c$ be the subpath of $c'$ from the root to $lca(S_h)$, so $c$ is also vertex-disjoint with $N[T]$.

The 3 paths $c_T, c_{T'}, P_h$ pass $h'$ while the indegree of $h'$ is 2, so at least two among them have common vertices above $h'$. If these two paths are $c_T, c_{T'}$, then by the same argument as in the proof of Lemma 5, where $T, T'$ correspond to $Y, Z$, we deduce a contradiction. So $P_h$ has common vertices with either $c_T$ or $c_{T'}$. Suppose that it is $c_T$; we have the following cases:

- $lca(T)$ is on $P_h$: then by applying Lemma 12 with $S = T$, there is a hybrid vertex of $\mathcal{H}^a_0$ on $P_h$ which is above $lca(T)$, so above $h'$, and we are done.

- $c_T$ intersects $P_h$ at a hybrid vertex $h''$ above $h'$ (Figure 9(b)). As proved above, there is a path $c$ from the root to $lca(S_h)$ which is vertex-disjoint with $N[T]$. Let $P'$ be the subpath of $P_h$ from $lca(S_h)$ to $h''$, and $C$ be the path consisting of $c$ and $P'$. We will prove that $lca(T)$ must be on $P'$. Because $c_T$ is a path $lca(T) \leadsto h'$, every path from the root to $h''$ must have a common vertex with $N[T]$. Since $N[T]$ does not contain any hybrid vertex, we deduce from the latter that every path from the root to $h''$ must pass $lca(T)$. In other words, $C$ must pass $lca(T)$. We know that $lca(T)$ cannot be on $c$ because this path is vertex-disjoint with $N[T]$. So $lca(T)$ must be on $P'$. Hence, we return to the previous case.

- $lca(S_h)$ is on $c_T$ (Figure 9(c)): As proved above, there is a path $c$ from the root to $lca(S_h)$ which is vertex-disjoint with $N[T]$. This path intersects $c_T$ at a hybrid vertex $h''$ above $lca(S_h)$. The subpath of $c$ from the root to $h''$ is also vertex-disjoint with $N[T]$, contradicting the fact that $c_T$ is a path $lca(T) \leadsto h'$. □

Lemma 14 (i) $\forall h_1, h_2 \in \mathcal{H}^a_1$, if $h_1 \to h_2$ then $h_2 \in \mathcal{H}^a_1 \cap \mathcal{H}^b_1$.

(ii) $\forall h_1, h_2 \in \mathcal{H}^a_1$ such that $h_1 \ne h_2$, if $h_1 \to h'_1$ and $h_2 \to h'_2$ then $h'_1 \ne h'_2$.

Proof: (i) By definition, $h_2 \in \mathcal{H}^a_1$, and $h_2$ is the highest hybrid vertex of $I(h_1)$ on $P_{h_1}$. Suppose that $h_2 \in \mathcal{H}^a_1 \cap \mathcal{H}^b_2$; then according to Lemma 13 there is a hybrid vertex of $\mathcal{H}^a_0$ on $P_{h_1}$ above $h_2$. So, this hybrid vertex is in $I_0(h_1)$. It is a contradiction because $h_2$ must be the highest hybrid vertex of $I(h_1)$. Hence, $h_2 \in \mathcal{H}^a_1 \cap \mathcal{H}^b_1$.

Figure 10: Proof of Lemma 14 (ii). (a) $P_{h_1}$ and $P_{h_2}$ do not have any common vertex above $h_0$. (b) $P_{h_1}$ and $P_{h_2}$ intersect at a hybrid vertex $h'$ above $h_0$.

(ii) Suppose that $h'_1 = h'_2 = h_0$. So, the 2 paths $P_{h_1}, P_{h_2}$ pass $h_0$. There are the following cases:

- $lca(S_{h_2})$ is on $P_{h_1}$; then by Lemma 12, there is a hybrid vertex $h'$ of $\mathcal{H}^a_0$ on $P_{h_1}$ above $lca(S_{h_2})$. It means that $h' \in I_0(h_1)$ and is above $h_0$, contradicting $h_0$ being the highest hybrid vertex of $I(h_1)$. Similarly for the case where $lca(S_{h_1})$ is on $P_{h_2}$.

- $P_{h_1}$ and $P_{h_2}$ do not have any common vertex above $h_0$ (Figure 10(a)). We have $h_1 \to h_0$, so $P_{h_1}$ intersects $P_{h_0}$ at a hybrid vertex $h^1$ above $h_0$. Similarly, $P_{h_2}$ intersects $P_{h_0}$ at a hybrid vertex $h^2$ above $h_0$. Suppose that $h^1$ is above $h^2$; the case where $h^2$ is above $h^1$ is treated similarly. Note that the two paths $P_{h_0}$ and $P_{h_1}$ have two common vertices: $h^1$ and $h_0$. Then, by the property that we impose on $P_h$ for any $h$ in $\mathcal{H}^a_1$, the two subpaths of $P_{h_0}$ and $P_{h_1}$ from $h^1$ to $h_0$ are the same. $h^2$ is a vertex on the subpath of $P_{h_0}$ from $h^1$ to $h_0$, so it must also be on the subpath of $P_{h_1}$ from $h^1$ to $h_0$. However, it means that $h^2$ is a common vertex above $h_0$ of $P_{h_1}$ and $P_{h_2}$, a contradiction.

- $P_{h_1}$ and $P_{h_2}$ intersect at a hybrid vertex $h'$ above $h_0$ (Figure 10(b)). So $h' \notin I(h_1) \cup I(h_2)$ because $h_0$ is the highest hybrid vertex of $I(h_1)$ and of $I(h_2)$. We deduce that $h' \in \mathcal{H}^a_{\ge 1}$. Let $S' = a^{-1}(h')$. It is easy to see that $N[S']$ must have common vertices above $h'$ with either $P_{h_1}$ or $P_{h_2}$. So, $h'$ is either in $I(h_1)$ or in $I(h_2)$, a contradiction.

Therefore, $h'_1 \ne h'_2$. □
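Lemma 15 The function $t$ has the following properties:

(i) $\forall h \in \mathcal{H}^a_1 \cap \mathcal{H}^b_2$, $t(h)$ is defined and equal to a hybrid vertex of $(\mathcal{H}^a_0 \cup \mathcal{H}^a_{\ge 2}) \cap \mathcal{H}^b_1$.
(ii) $\forall h_0 \in \mathcal{H}^a_0 \cap \mathcal{H}^b_1$, $t^{-1}(h_0)$ contains at most 2 hybrid vertices of $\mathcal{H}^a_1 \cap \mathcal{H}^b_2$.
(iii) $\forall h_0 \in \mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_1$, $t^{-1}(h_0)$ contains at most 1 hybrid vertex of $\mathcal{H}^a_1 \cap \mathcal{H}^b_2$.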



Proof: (i) Let $h \in \mathcal{H}^a_1 \cap \mathcal{H}^b_2$, and suppose that we have a chain of split SN-sets $h \to h_1 \to \dots \to h_m$ defined as in Definition 8.

Firstly, we will prove that $h_i \ne h$ for any $i$. It is obvious that $h \ne h_1$ because $h \to h_1$. Suppose that $h_i = h$ for a certain $i > 1$; then $h_{i-1} \to h$. By Lemma 14 (i), $h \notin \mathcal{H}^a_1 \cap \mathcal{H}^b_2$, a contradiction.

Next, we will prove that $h_i \ne h_j$ for any distinct $i, j = 1, \dots, m$. Suppose otherwise, and let $i$ be the smallest index such that there exists $j$ greater than $i$ with $h_i = h_j = h'$. If $i > 1$, then we have $h_{i-1} \to h_i$ and $h_{j-1} \to h_j$. However, $h_{i-1} \ne h_{j-1}$ because $i$ is the smallest index having this property. So it is a contradiction with Lemma 14 (ii). If $i = 1$, we have $h \to h_1$ and $h_{j-1} \to h_j$. However, as just proved, $h \ne h_{j-1}$, so it is a contradiction with Lemma 14 (ii).

Hence, the recursive calls in Definition 8 do not loop, and since the number of split SN-sets is finite, $t(h)$ is always defined.

Now, we show that $\forall h \in \mathcal{H}^a_1 \cap \mathcal{H}^b_2$, $t(h) \ne null$. By Lemma 13, the fact that $h \in \mathcal{H}^a_1 \cap \mathcal{H}^b_2$ implies that $I_0(h) \ne \emptyset$, i.e. $I(h) \ne \emptyset$. Let $h_0$ be the highest vertex of $I(h)$. If $h_0 \in I_0(h) \cup I_2(h)$, then by definition $t(h) = h_0 \ne null$. Suppose that $h_0 \in I_1(h)$, that to define $t(h)$ we pass a chain of other split SN-sets $h \to h_1 \to \dots \to h_i$, and that $I(h_i) = \emptyset$, i.e. $t(h_i) = null$. Because $h_{i-1} \to h_i$, the two paths $P_{h_{i-1}}$ and $P_{h_i}$ pass $h_i$ and have common vertices above $h_i$. There are the following cases:

- $lca(S_{h_i})$ is on $P_{h_{i-1}}$. Then by Lemma 12, there is a hybrid vertex of $\mathcal{H}^a_0$ on $P_{h_{i-1}}$ above $lca(S_{h_i})$, so above $h_i$. It means that this hybrid vertex is in $I(h_{i-1})$ and above $h_i$, contradicting $h_i$ being the highest hybrid vertex in $I(h_{i-1})$.

- $lca(S_{h_{i-1}})$ is on $P_{h_i}$. Then by Lemma 12, there is a hybrid vertex of $\mathcal{H}^a_0$ on $P_{h_i}$ above $lca(S_{h_{i-1}})$. It means that this hybrid vertex is in $I(h_i)$, contradicting $I(h_i) = \emptyset$.

- $P_{h_i}$ and $P_{h_{i-1}}$ intersect at a hybrid vertex $h'$ above $h_i$. $h'$ is not in $I(h_{i-1})$ because $h_i$ is the highest vertex of $I(h_{i-1})$. We deduce that $h' \in \mathcal{H}^a_{\ge 1}$. Let $S' = a^{-1}(h')$, so $N[S']$ must have common vertices above $h'$ with either $P_{h_{i-1}}$ or $P_{h_i}$. If it has common vertices with $P_{h_{i-1}}$ then $h' \in I(h_{i-1})$, contradicting $h_i$ being the highest vertex of $I(h_{i-1})$. If it has common vertices with $P_{h_i}$ then $h' \in I(h_i)$, i.e. $I(h_i) \ne \emptyset$, a contradiction.

So $t(h_i) \ne null$, and therefore $t(h) \ne null$. Let $t(h_i) = h_0$; we deduce from the definition of $t$ that $h_0$ is a hybrid vertex of $\mathcal{H}^a_0 \cup \mathcal{H}^a_{\ge 2}$. If $h_0 \in \mathcal{H}^b_2$, then by Lemma 13 there is a hybrid vertex of $\mathcal{H}^a_0$ on $P_{h_i}$ above $h_0$. Then this hybrid vertex is in $I_0(h_i)$, contradicting $h_0$ being the highest vertex of $I(h_i)$. Hence, $h_0 \in \mathcal{H}^b_1$, i.e. $t(h) \in (\mathcal{H}^a_0 \cup \mathcal{H}^a_{\ge 2}) \cap \mathcal{H}^b_1$.

(ii) Let $h_0$ be a hybrid vertex in $\mathcal{H}^a_0 \cap \mathcal{H}^b_1$. Suppose that there are 3 distinct hybrid vertices $h_1, h_2, h_3$ of $\mathcal{H}^a_1 \cap \mathcal{H}^b_2$ such that $t(h_1) = t(h_2) = t(h_3) = h_0$.

Suppose that before reaching $h_0$, each one passes a chain of split SN-sets: $h_1 \to \dots \to h'_1$, $h_2 \to \dots \to h'_2$, $h_3 \to \dots \to h'_3$, and $t(h'_1) = t(h'_2) = t(h'_3) = h_0$, i.e. $h_0$ is the highest hybrid vertex of $I(h'_1)$, $I(h'_2)$, $I(h'_3)$.

Figure 11: $h_0 \in \mathcal{H}^a_0 \cap \mathcal{H}^b_1$.

Figure 12: $h_0 \in \mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_1$.

By Lemma 14 (i), in each chain, only the first hybrid vertex, i.e. $h_1, h_2, h_3$, is in $\mathcal{H}^a_1 \cap \mathcal{H}^b_2$; the others are not. Because $h_1, h_2, h_3$ are distinct, by Lemma 14 (ii) these chains do not have common split SN-sets. In other words, $h'_1, h'_2, h'_3$ are also distinct. So, we need only show that $h_0$ cannot be the highest hybrid vertex of $I(h'_1)$, $I(h'_2)$, $I(h'_3)$. The 3 paths $P_{h'_1}, P_{h'_2}, P_{h'_3}$ pass $h_0$, so among them there are at least 2, for example $P_{h'_1}$ and $P_{h'_2}$, that have common vertices above $h_0$. There are the following cases:

- If $lca(S_{h'_2})$ is on $P_{h'_1}$, then by Lemma 12 there is a hybrid vertex of $\mathcal{H}^a_0$ on $P_{h'_1}$ above $lca(S_{h'_2})$, i.e. above $h_0$. Then, this hybrid vertex is in $I_0(h'_1)$. It is a contradiction because $h_0$ is supposed to be the highest hybrid vertex of $I(h'_1)$. Similarly for the case where $lca(S_{h'_1})$ is on $P_{h'_2}$.

- If $P_{h'_1}$ and $P_{h'_2}$ intersect at a hybrid vertex $h'$ above $h_0$ (Figure 11), then $h' \notin I(h'_1) \cup I(h'_2)$. So, $h' \in \mathcal{H}^a_{\ge 1}$. Let $S' = a^{-1}(h')$; then $N[S']$ must have common vertices above $h'$ with either $P_{h'_1}$ or $P_{h'_2}$. So, either $h' \in I(h'_1)$ or $h' \in I(h'_2)$, a contradiction.

(iii) Let $h_0$ be a hybrid vertex in $\mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_1$ and $S_0 = a^{-1}(h_0)$. Suppose that there are 2 distinct hybrid vertices $h_1, h_2$ of $\mathcal{H}^a_1 \cap \mathcal{H}^b_2$ such that $t(h_1) = t(h_2) = h_0$.

Suppose that before reaching $h_0$, each one passes a chain of split SN-sets: $h_1 \to \dots \to h'_1$, $h_2 \to \dots \to h'_2$, and $t(h'_1) = t(h'_2) = h_0$, i.e. $h_0$ is the highest hybrid vertex of $I(h'_1)$, $I(h'_2)$. Similarly to Claim (ii), $h_1, h_2$ are the only hybrid vertices of $\mathcal{H}^a_1 \cap \mathcal{H}^b_2$ in these 2 chains, and $h'_1, h'_2$ are distinct. So, we have only to show that $h_0$ cannot be the highest hybrid vertex of $I(h'_1)$, $I(h'_2)$. Suppose otherwise; because $P_{h'_1}, P_{h'_2}$ pass $h_0$, we have the following cases.

- If $lca(S_{h'_2})$ is on $P_{h'_1}$, or $lca(S_{h'_1})$ is on $P_{h'_2}$, or $P_{h'_1}, P_{h'_2}$ intersect at a hybrid vertex $h'$ above $h_0$, then by the same argument as that of Claim (ii), we deduce contradictions.

- The last case is the case where $P_{h'_1}$ and $P_{h'_2}$ intersect at $h_0$ (Figure 12). Let $S_0 = a^{-1}(h_0)$; then $N[S_0]$ must have common vertices above $h_0$ with either $P_{h'_1}$ or $P_{h'_2}$. It means that either $h_0 \in I(h'_1)$ or $h_0 \in I(h'_2)$, a contradiction. □



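Lemma 16 $|\mathcal{H}^b_2| \le 2|\mathcal{H}^a_0| + |\mathcal{H}^a_{\ge 2}|$

Proof: By Lemma 15, we deduce that $|\mathcal{H}^a_1 \cap \mathcal{H}^b_2| \le 2|\mathcal{H}^a_0 \cap \mathcal{H}^b_1| + |\mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_1|$ (Figure 7(c)). Then:

$$|\mathcal{H}^b_2| = |\mathcal{H}^a_0 \cap \mathcal{H}^b_2| + |\mathcal{H}^a_1 \cap \mathcal{H}^b_2| + |\mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_2| \le |\mathcal{H}^a_0 \cap \mathcal{H}^b_2| + 2|\mathcal{H}^a_0 \cap \mathcal{H}^b_1| + |\mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_1| + |\mathcal{H}^a_{\ge 2} \cap \mathcal{H}^b_2|.$$

So, $|\mathcal{H}^b_2| \le 2|\mathcal{H}^a_0| + |\mathcal{H}^a_{\ge 2}|$. □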
4.3 A bound



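Theorem 1 If $\mathcal{T}$ is consistent with a level-$k$ network $N$, then there exists a level-$k$ network $N'$ with the same number of hybrid vertices as $N$, which has at most $\lfloor \frac{4}{3}k \rfloor$ split SN-sets.

Proof: According to Lemmas 11 (ii) and 16, we have:

$$|\mathcal{S}| \le \frac{4}{3}k + \frac{1}{3}|\mathcal{H}^b_2| - |\mathcal{H}^a_0| - \frac{1}{2}|\mathcal{H}^a_{\ge 2}| \le \frac{4}{3}k - \frac{1}{3}|\mathcal{H}^a_0| - \frac{1}{6}|\mathcal{H}^a_{\ge 2}| \le \frac{4}{3}k.$$

Since $|\mathcal{S}|$ is an integer, $|\mathcal{S}| \le \lfloor \frac{4}{3}k \rfloor$. Therefore, by Lemma 9 and the assumption stated at the beginning of the section, if $\mathcal{T}$ is consistent with a level-$k$ network $N$, then there exists a level-$k$ network $N'$ with the same number of hybrid vertices as $N$, which has at most $\lfloor \frac{4}{3}k \rfloor$ split SN-sets. □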
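The chain of inequalities in this proof is plain arithmetic and can be replayed with exact fractions. The Python sketch below is an illustration only; the function name `split_set_bounds` and its argument names are hypothetical, and the inputs are just the cardinalities $|\mathcal{H}^a_0|$, $|\mathcal{H}^a_{\ge 2}|$ and $|\mathcal{H}^b_2|$, which are assumed to satisfy Lemma 16.

```python
from fractions import Fraction


def split_set_bounds(k, h_a0, h_a_ge2, h_b2):
    """Return the three successive upper bounds on |S| used for Theorem 1.

    k       : the level (upper bound on the number of hybrid vertices)
    h_a0    : |H^a_0|
    h_a_ge2 : |H^a_{>=2}|
    h_b2    : |H^b_2|, assumed to satisfy Lemma 16
    """
    if h_b2 > 2 * h_a0 + h_a_ge2:
        raise ValueError("the sizes violate Lemma 16")

    # Bound of Lemma 11 (ii).
    lemma11 = (Fraction(4, 3) * k + Fraction(1, 3) * h_b2
               - h_a0 - Fraction(1, 2) * h_a_ge2)
    # Bound after substituting Lemma 16 for |H^b_2|.
    after_lemma16 = (Fraction(4, 3) * k
                     - Fraction(1, 3) * h_a0 - Fraction(1, 6) * h_a_ge2)
    # Final integer bound floor(4k/3).
    final = (4 * k) // 3
    return lemma11, after_lemma16, final
```

For example, `split_set_bounds(6, 1, 2, 3)` returns `(Fraction(7, 1), Fraction(22, 3), 8)`, so the three bounds are non-decreasing, as the proof requires.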

Remark: Up to now we do not have any example achieving this bound. Therefore it is possible that the bound $\lfloor \frac{4}{3}k \rfloor$ is not optimal. In fact, each time we were able to construct an example of a network which reaches this bound, we could modify it into another network which has a smaller number of split SN-sets without changing the number of hybrid vertices. In particular, for the cases of $k \le 8$, it can be checked case by case that the number of split SN-sets in the restricted networks class is bounded by $k$.

5 Constructing a minimum phylogenetic network. 



Data: A dense triplet set $\mathcal{T}$ on the set $\mathcal{L}$ of $n$ species, and a fixed $k$
Result: A minimum level-$k$ network consistent with $\mathcal{T}$, if there exists one

1  Compute the SN-tree of $\mathcal{T}$;
2  For every singleton $u$ of $\mathcal{L}$, let $N_u$ be the network containing only one leaf $u$;
3  for (each non-singleton SN-set $A$ of $\mathcal{T}$, in non-decreasing order of size) do
4      $\mathcal{T}' = \mathcal{T}|A$; $N_A = null$; $min = 0$;
5      for (each set $C$ of at most $\lfloor \frac{4}{3}k \rfloor$ disjoint non-singleton descendants of $A$) do
6          $P \leftarrow$ the partition of $A$ inferred from $C$;
7          $N_A \leftarrow$ a level-$k$ network consistent with $\mathcal{T}'$ that has $P$ as its partition;
8          $min \leftarrow$ the number of hybrid vertices in $N_A$;
9          for (each level-$k$ network $N'_A$ consistent with $\mathcal{T}'$ that has $P$ as its partition) do
10             if (the number of hybrid vertices of $N'_A$ < $min$) then
11                 $min \leftarrow$ the number of hybrid vertices of $N'_A$;
12                 $N_A = N'_A$;
13     if ($N_A = null$) then
14         return null;
15 return $N_{\mathcal{L}}$

Algorithm 1: Constructing a minimum level-k phylogenetic network
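The loop at line 5 of Algorithm 1 enumerates the candidate sets $C$. A minimal Python sketch of this enumeration is given below, under assumptions that are not taken from the paper: SN-sets are encoded as `frozenset` objects of leaf names, the descendants of $A$ are taken to be the SN-sets strictly included in $A$, and the empty set $C$ is included as a candidate. The function name `candidate_split_sets` is hypothetical.

```python
from itertools import combinations


def candidate_split_sets(A, sn_sets, k):
    """Yield every tuple C of at most floor(4k/3) pairwise disjoint,
    non-singleton SN-sets strictly included in A.

    A       : frozenset of leaves (the current SN-set)
    sn_sets : iterable of frozensets (all SN-sets of the triplet set)
    k       : the fixed level
    """
    limit = (4 * k) // 3
    # Non-singleton descendants of A, in a deterministic order.
    descendants = sorted(
        (S for S in sn_sets if len(S) > 1 and S < A),
        key=lambda S: (len(S), sorted(S)),
    )
    for size in range(limit + 1):
        for C in combinations(descendants, size):
            # Keep only the families whose members are pairwise disjoint.
            if all(X.isdisjoint(Y) for X, Y in combinations(C, 2)):
                yield C
```

Since $A$ has $O(n)$ non-singleton descendants, this generator produces $O(n^{\lfloor 4k/3 \rfloor})$ candidates, which is the count used in the complexity analysis of Theorem 2.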

Theorem 2 For every dense triplet set $\mathcal{T}$ and a fixed $k$, Algorithm 1 takes time $O(|\mathcal{T}|^{k+1} n^{\lfloor \frac{4}{3}k \rfloor + 1})$ and returns a minimum level-$k$ network consistent with $\mathcal{T}$ if there is any.
Proof: The correctness of Algorithm 1

This algorithm consists of constructing a network on each SN-set following a non-decreasing order of size (the loop For at line 3). So for each iteration corresponding to an SN-set $A$, a minimum level-$k$ network on each SN-set smaller than $A$ is already constructed. Remark that if $N$ is a minimum level-$k$ network consistent with $\mathcal{T}$ then each $N_i$ is a minimum level-$k$ network consistent with $\mathcal{T}|L(N_i)$. So, by constructing for each $A$ a minimum level-$k$ network consistent with $\mathcal{T}|A$, finally $N_{\mathcal{L}}$ is a minimum level-$k$ network consistent with $\mathcal{T}$.

For each SN-set $A$, we must find a partition of $A$ which is the one in a minimum level-$k$ network consistent with $\mathcal{T}|A$. By Lemma 2, each partition is determined by a set of split SN-sets, and each one is a non-singleton descendant of $A$. By Theorem 1, we need only check all the possible sets of descendants of $A$ having cardinality at most $\lfloor \frac{4}{3}k \rfloor$. That is what the loop For at line 5 does. Next, for each partition $P$ inferred from each set of split SN-sets, the algorithm checks all level-$k$ networks which are consistent with $\mathcal{T}'$ and have $P$ as their partition, and then chooses the one which contains the minimum number of hybrid vertices. The network found is stored in $N_A$. That is what the loop For at line 9 does. If $N_A = null$, i.e. there is not any level-$k$ network consistent with $\mathcal{T}|A$, then we conclude that there is not any level-$k$ network consistent with $\mathcal{T}$.

The complexity of Algorithm 1

- The SN-tree of $\mathcal{T}$ can be computed in $O(n^3)$ (using the algorithm in [9]).

- The first loop For: There are at most $O(n)$ non-singleton SN-sets, so there are at most $O(n)$ constructions.

- The second loop For repeats at most $n^{\lfloor \frac{4}{3}k \rfloor}$ times because $A$ has at most $O(n)$ non-singleton descendants, so there are at most $n^{\lfloor \frac{4}{3}k \rfloor}$ possibilities for $C$.

- In the body of the second loop For: based on Lemma 1, to construct a level-$k$ network consistent with $\mathcal{T}'$ that has $P$ as its partition, there are two steps. First, we compute a level-$k$ simple network $N_s$ consistent with $\mathcal{T}' \nabla P$. Then, we replace each leaf of $N_s$ by the subnetwork already found on the corresponding part of $P$. According to [19], we can compute all level-$k$ simple networks consistent with $\mathcal{T}' \nabla P$ in time $O(|\mathcal{T}' \nabla P|^{k+1})$, or in time $O(|\mathcal{T}|^{k+1})$. The times needed to compute the partition $P$ of $A$ from the set of split SN-sets $C$ (Lemma 2), and to replace each leaf of $N_s$ by a subnetwork, are negligible compared to the time for computing all the simple networks. So this loop takes time $O(|\mathcal{T}|^{k+1})$.

Hence, the total complexity is $O(|\mathcal{T}|^{k+1} n^{\lfloor \frac{4}{3}k \rfloor + 1})$. □

Corollary 1 For every dense triplet set $\mathcal{T}$, a minimum level phylogenetic network consistent with $\mathcal{T}$ which minimizes the number of hybrid vertices can be computed in polynomial time if the minimum level is restricted.

Proof: It is easy to see that we can slightly modify Algorithm 1 so that it returns a minimum level network. Indeed, we try to construct a minimum level-$i$ network consistent with $\mathcal{T}$, if there is any, in increasing order of $i$. Then, the first value of $i$ for which the algorithm returns a network corresponds to the minimum level of the networks consistent with $\mathcal{T}$. So the complexity is $O(k|\mathcal{T}|^{k+1} n^{\lfloor \frac{4}{3}k \rfloor + 1})$, where $k$ is the minimum level. □
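The modification described in this proof amounts to a simple outer loop over the level. The Python sketch below illustrates it under stated assumptions: `build_level_k` is a hypothetical callable standing in for Algorithm 1 (it is not implemented here) that returns a network or `None`, and `max_level` is an assumed cap on the levels that are tried.

```python
def minimum_level_network(triplets, build_level_k, max_level):
    """Try the levels 0, 1, ..., max_level in increasing order and return
    (level, network) for the first level at which a network exists.

    triplets      : the dense triplet set, in whatever encoding
                    build_level_k expects
    build_level_k : hypothetical stand-in for Algorithm 1; called as
                    build_level_k(triplets, i), returns a network or None
    max_level     : largest level to try (assumption of this sketch)
    """
    for level in range(max_level + 1):
        network = build_level_k(triplets, level)
        if network is not None:
            # The first success gives the minimum level.
            return level, network
    return None
```

Each call at level $i \le k$ costs at most as much as the call at level $k$, which gives the extra factor $k$ in the complexity stated above.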
6 Conclusion and perspectives

We proved that for any fixed $k$, we can construct a level-$k$ network having the minimum number of hybrid vertices, if there exists one, in polynomial time. Furthermore, if the minimum level of the networks consistent with $\mathcal{T}$ is restricted, we can also construct one in polynomial time.

[17] implemented the algorithm for level-2 networks and applied it to part of the yeast genomic data. However, on a bigger data set, there does not exist any level-2 network. So with our result, one could expect to find solutions in practice on real data, for small values of $k$ (for example with k < 5).

For simple networks, [10], [9], [17] showed that there are efficient algorithms for level-1 and level-2. However, for general level-$k$ networks, there exists only an $O(|\mathcal{T}|^{k+1})$ algorithm [19]. Any improvement for this problem, even on small levels, would allow us to implement our algorithm more efficiently.

For any triplet set $\mathcal{T}$ we can define $Treerank(\mathcal{T})$ as the minimum $k$ for which there exists a level-$k$ network representing $\mathcal{T}$. This measures the distance from $\mathcal{T}$ to being consistent with a tree, in terms of the number of hybrid vertices. We proved in this paper that for dense triplet sets, and for any fixed $k$, checking whether $Treerank(\mathcal{T}) \le k$ can be done in polynomial time. Furthermore, [16] proved the NP-hardness of computing whether $Treerank(\mathcal{T}) \le k$. Therefore this parameter behaves on phylogenetic networks similarly to treewidth on undirected graphs. Perhaps this analogy could yield further interesting structural insights, as shown in [5] with a nice recursive construction.

Another question is under which conditions on the triplet set $\mathcal{T}$ there is only one network consistent with $\mathcal{T}$. It would also be interesting to know whether the condition of density on the triplet set can be relaxed so that there is still a polynomial algorithm to construct a consistent level-$k$ network, if there is any, for any fixed $k$.